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Abstract 

Background: SARS coronavirus (SARS-CoV) was identified as the etiological agent of SARS, and 
extensive investigations indicated that it originated from an animal source (probably bats) and was 
recently introduced into the human population via wildlife animals from wet markets in southern 
China. Previous studies revealed that the spike (S) protein of SARS had experienced adaptive 
evolution, but whether other functional proteins of SARS have undergone adaptive evolution is not 
known. 

Results: We employed several methods to investigate selective pressure among different SARS- 
CoV groups representing different epidemic periods and hosts. Our results suggest that most 
functional proteins of SARS-CoV have experienced a stepwise adaptive evolutionary pathway. 
Similar to previous studies, the spike protein underwent strong positive selection in the early and 
middle phases, and became stabilized in the late phase. In addition, the replicase experienced 
positive selection only in human patients, whereas assembly proteins experienced positive 
selection mainly in the middle and late phases. No positive selection was found in any proteins of 
bat SARS-like-CoV. Furthermore, specific amino acid sites that may be the targets of positive 
selection in each group are identified. 

Conclusion: This extensive evolutionary analysis revealed the stepwise evolution of different 
functional proteins of SARS-CoVs at different epidemic stages and different hosts. These results 
support the hypothesis that SARS-CoV originated from bats and that the spill over into civets and 
humans were more recent events. 


Background 

Severe acute respiratory syndrome (SARS) emerged in 
Guangdong province of China in November 2002 and 
subsequently spread rapidly to 25 countries across five 


continents within 3-4 months [1], Soon after its first out¬ 
break, the etiological agent of SARS was identified as a 
novel coronavirus [2-4], and its complete genome 
sequence was determined [3,5,6], The identification of 
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SARS-CoV in Himalayan palm civets and raccoon dogs in 
live animal markets in Guangdong, China, provided the 
first clue of an animal-to-human transmission [7,8], Fur¬ 
ther studies indicated that civets were unlikely to be the 
natural reservoir [9]. Instead the detection of different 
SARS-like-CoVs in horseshoe bats (Rhinolophus spp.) 
seemed to suggest that bats might be the natural reservoir 
of SARS-CoV and many other closely related coronavi- 
ruses [10-13]. 

Like other coronaviruses, SARS-CoV is an enveloped, pos¬ 
itive-stranded RNA vims with a genome of approximately 
29,700 nucleotides. The genome contains at least 14 open 
reading frames (ORFs) that encode 28 proteins in three 
distinct classes: two large polyproteins Pla and Plab that 
are cleaved into 16 non-structural proteins (nspl-nspl6) 
during viral RNA synthesis; four structural proteins (S, E, 
M and N) that are essential for viral entry and assembly; 
and eight accessory proteins that are believed to be non- 
essential for viral replication, but may facilitate viral 
assembly and play a role in viral virulence and pathogen¬ 
esis (Figure 1) [14-17], 

Similar to all RNA viruses, SARS-CoV replication is associ¬ 
ated with genomic and antigenic variation. The to ratio 
(i d N /d s ratio of non-synonymous to synonymous substitu¬ 
tions) can measure the selective pressure at protein level, 
with co = 1, < 1, > 1 indicating neutral selection, negative 
selection and positive selection, respectively [18,19], Pre¬ 
vious studies have suggested that the S protein of SARS- 
CoV experienced positive selection during SARS epidemic 
[20-22], However, these studies did not find or did not 
analyze for positive selection among the replicase or 


accessory proteins, which may be equally important for 
SARS-CoV's adaptation to a new host. In order to system¬ 
atically investigate the adaptive evolutionary process of 
SARS-CoVs, we employed the branch-site model to ana¬ 
lyze the selective pressures that may act upon some key 
SARS-CoV functional proteins involved in vims entry, rep¬ 
lication and assembly. Our results suggest that diversified 
selective forces act upon different proteins and during dif¬ 
ferent epidemic phases. 

Methods 
Sequence data 

A total of 156 sequences of SARS-CoVs or bat SARS-like- 
CoVs were retrieved from GenBank (129 complete 
genomes and 27 partial genomes) (see additional file 1). 
Based on these sequences, three datasets were constmcted. 
Dataset 1 contains all Spike genes. Dataset 2 is a merged 
dataset that includes sequences of 4 main replicase 
domains of SARS-CoV: papain-like protease (PLpro), SC- 
like protease (3CLpro), RNA dependent RNA polymerase 
(RdRp) and Helicase (ITel). Dataset 3 is a merged dataset 
that includes sequences of 7 ORFs: ORF3a, E, M, ORF6, 
ORF7a, ORF7b and N genes. 

These protein-coding sequences are aligned based on 
translated protein sequences using Clustal W program 
implemented in BioEdit [23,24]. Prior to analysis all 
sequences that were identical to another within the data¬ 
set were removed, since previous studies have shown to 
have little effect on the detection of positive selection and 
contribute little evolutionary information [25], Align¬ 
ment gaps were manually removed based on the reference 
sequence of 31-HP03L_Tor2 (NC_004718). 



Figure I 

Schematic diagram of the SARS-CoV genome organization and viral proteins. The protein coding regions analyzed 
in this paper were shaded in grey. 
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The final composition of each dataset is as follows: dataset 
1 contains 3765 bp of 59 S gene sequences; dataset 2 
includes 35 sequences of replicase domains, 6435 bp in 
total (945 bp for PLpro, 918 bp for 3CLpro, 2769 bp for 
RdRp, 1803 bp for Hel) [17,26-28]; and dataset 3 contains 
56 combined sequences, 3666 bp in total (822 bp for 
ORF3a, 228 bp for E, 663 bp for M, 189 bp for ORF6, 366 
bp for ORF7a, 132 bp for ORF7b and 1266 bp for N). 

Phylogenetic analysis and reclassification of SARS-CoVs 

For each dataset, a phylogenetic tree was built with MrBayes 
3.1.2 (1,000,000 generations, sampled every 100 genera¬ 
tions, burnin = 500, 4 chains) [29], The tree topologies pre¬ 
sented in figures 2, 3, 4 were used for different models. In 
previous studies, SARS-CoV isolates have been divided into 
five groups: 02-03 palm civets, 02-03 early, middle, late 
human patients, and 03-04 civet and human [20,21], In 
the current study, we included an additional group contain¬ 
ing the bat SARS-like-CoVs. Based on tree topologies and 
epidemiological information, we reclassified each dataset, 
such as to enable us to realistically investigate the adaptive 
evolution of SARS-CoVs in different hosts and during dif¬ 
ferent epidemic periods. As showed in figures 2, 3, 4, the 
following groups were established: the BSL group, repre¬ 
senting bat SARS-like-CoVs; the PC03 group, representing 
isolates from palm civets in 2003; the F1PEM group, repre¬ 
senting human patient isolates during early and middle 
epidemic phases in 2002-03; the FIPL group, representing 
human patient isolates during late epidemic phase in 2003; 
the PCHP04 group, representing civet and human 
sequences from the 2003-04 epidemic phase; the HP03 
group, representing all isolates collected from human 
patients during the epidemic period of 2002-03; and the 
HPML group, representing human patient isolates collected 
during the middle and late epidemic phases in 2003; and 
finally, the SARS group, representing all isolates collected 
from civets and human patients in 2002-04. 

Detection of recombination and positive selection 

Since recombination can influence the detection of posi¬ 
tive selection, we first tested for recombination in our data 
sets by using a genetic algorithm for recombination detec¬ 
tion (CARD) [30], Identified breakpoints by CARD were 
then assessed for significance by using Kishino-FIasegawa 
(KIT) test in HYP1TY package [31]. Since most sequences 
in SARS group have high similarity and increasing the 
number of sequences may dilute the signal, for each data¬ 
set, we choose 10 sequences for CARD analysis (four from 
BSL group: 124-Bat_SARS-273, 125-Bat_SARS-279, 126- 
Bat_SARS-ITKll3, 127-Bat_SARS-Rp3; six from SARS 
group: 3-FIP03E_GZ02, 15-HP03M_BJ02, 31- 

HP03L_Tor2, 106-HP04_GZ0402, 110-PC04_PC4-136, 
130-PC03_SZ13). 

To test for diversifying selection and to infer codon sites 
under positive selection, we mainly used CODEML pro¬ 


gram in the PAML 4.1 software package, which is based on 
the maximum likelihood algorithm of Yang and cowork¬ 
ers [32], Three kinds of models (branch-specific, site-spe¬ 
cific and branch-site) were employed to detect selective 
pressure among different branches and at different sites. 
The likelihood ratio test (LRT) was used to investigate 
whether the null hypothesis, where no positive selection 
is allowed, can be rejected against the alternative hypoth¬ 
esis, where positive selection is allowed [32], The one 
ratio model (M0) assumes the same cd ratio for all 
branches and sites in the phylogeny. The free-ratio (FR) 
model assumes an independent cd ratio for each branch in 
the phylogeny. M0 and FR can be compared using LRT to 
examine whether cd ratios are different among lineages. 
The discrete model (M3) estimates cd for three classes of 
codon. Comparing M0 and M3 can test the variability of 
selective pressure among sites. When evidence for positive 
selection (cd > 1) was detected, the naive empirical Bayes 
(NEB) method was used to calculate posterior probabili¬ 
ties for site classes. A higher posterior probability suggests 
strong support for a site to be under positive selection. In 
brief, the branch-specific model assumes variation among 
branches, but not among sites; the site-specific model 
assumes variation of selective pressure among sites, but 
not among branches. Both models are widely used to 
investigate selective pressure. Llowever, if adaptive evolu¬ 
tion occurs at a few time points and affects a few amino 
acids, these two models might lack power in detecting 
positive selection. To overcome this limitation, we also 
used branch-site model, which assumes that the cd ratio 
varies both among sites and among lineages [33,34], In 
the branch-site model A (model A), the lineages of interest 
are set to be foreground, and the other lineages to be back¬ 
ground. Selective constrains are assumed to vary across 
sites both along foreground and background, and a small 
fraction of sites only vary along foreground lineages. 
There are 3 cd ratios for foreground (0 < cd 0 < 1, CDj = 1, cd 2 
>1) and 2 cd ratios for background (0 < cd 0 < 1, cd : = 1) in 
branch-site model A. When evidence for positive selection 
(cd > 1) was detected, both naive empirical Bayes and 
Bayes empirical Bayes (BEB) were used to calculate poste¬ 
rior probabilities for site classes. Since NEB does not 
account for sampling errors, we used the BEB outputs as 
suggested by Yang [35], The null model (model A 1 ) is 
same as model A, but cd 2 = 1 is fixed. Branch-site model 
tends to be the most powerful of the three tests. In order 
to investigate the variation of selective pressure in differ¬ 
ent epidemic periods and hosts, we set each group of 
SARS-CoVs as foreground in turn to implement branch- 
site model. However, in such multiple tests, the probabil¬ 
ity of false rejection of at least one null hypothesis can be 
high. So we used Bonferroni correction to control the false 
positive rate, as it has been shown to be powerful when 
applied to the branch-site test [36], As to dataset 1 and 3, 
we applied branch-site model to 6 groups on the tree, and 
for dataset 2, we applied branch-site model to 5 groups. 
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Figure 2 

Phylogenetic relationships of 59 S gene sequences of SARS-CoVs from human and animals. The tree was gener¬ 
ated with MrBayes 3.1.2 program. Posterior probabilities are shown on the nodes of the tree. Branch between BSL group and 
others was depicted with dotted line, because the branch was too long to be displayed at same scale. Bar, 0.001 nucleotide 
substitutions per site. 
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Figure 3 

Phylogenetic relationships of 35 replicase domains of SARS-CoVs from human and animals. The tree was gener¬ 
ated with MrBayes 3.1.2 program. Posterior probabilities are shown on the nodes of the tree. Bar, 0.001 nucleotide substitu¬ 
tions per site. 


So we used 0.0083 as the significance level for each 
branch-site tests in dataset 1 and 3, used 0.01 as the signif¬ 
icance level for dataset 2. As indicated previously by Yang 
[33], these models sometimes fail to converge to maxi¬ 
mum likelihood estimates. We thus performed each anal¬ 
ysis at least twice using different starting values. Only 
identical data produced from both runs were considered 
reliable. All data are available upon request. 

In order to examine the robustness of those positive selec¬ 
tions identified by PAML, we also analyzed our datasets 


using HYPHY package accessed through the Datamonkey 
facility http://www.datamonkey.org [37]. Datamonkey 
includes three methods for detecting sites under selection: 
single likelihood ancestor counting (SLAC), fixed effects 
likelihood (FEL) and random effects likelihood (REL). 
REL method is often the only method that can infer selec¬ 
tion from small (5-15 sequences) or low divergence 
alignments and tends to be the most powerful of the three 
tests. So this method was run using the GTR substitution 
model on a neighbor-joining phylogenetic tree by the 
Datamonkey web server. In order to investigate selective 
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Figure 4 

Phylogenetic relationships of 56 3'-end ORF sequences of SARS-CoVs from human and animals. The tree was 
generated with MrBayes 3.1.2 program. Posterior probabilities are shown on the nodes of the tree. Bar, 0.001 nucleotide sub¬ 
stitutions per site. 
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pressure among different hosts and epidemic phases, we 
split each dataset (S protein, replicase domains, 3'-end 
ORFs) into appropriate groups for analysis. 

Results 

Phylogenetic analysis 

For all genes analyzed, represented by S, replicase and 3'- 
end ORFs gene trees, at least four groups are apparent: BSL, 
PC03, HP03, PCF1P04. As to the F1P03 group, it can be sub¬ 
grouped into HPEM and 11 PI. in S gene tree, and HPEM and 
HPML in 3'-end ORF tree. It should be noted that the pos¬ 
terior probabilities for several nodes are low and there are 
some polytomies. These uncertainties could be due to some 
sequences in SARS group have high amino acid similarity, 
especially for replicase and 3'-end ORFs which are more 
conservative. Flowever, previous studies suggested that the 
LRTs and qualitative results of ML parameter estimation 
were rather insensitive to tree topology [35,38-42], For 
branch-site model, Bayesian site identification might be 
affected by tree topology [40], Remarkably, one isolate (38- 
HP03L_GD01), which was isolated in the later epidemic 
phase in 2003, always clustered with the early phase iso¬ 
lates. A possible explanation could be that this patient was 
infected in the early epidemic phase, which is supported by 
sequence analysis; this isolate has 29 extra nucleotides 
between ORF8a and ORF8b, a feature shared among iso¬ 
lates from civets and early phase patients. Another isolate 
(5-HP03E_HGZ8Ll-B), which was isolated in the early epi¬ 
demic phase, tends to cluster with the middle phase isolates 
(Fig 2 and 4). This virus may be a transitional vims because 
it does not have the extra 29 nucleotides like the middle 
phase isolates. 


Detection of recombination 

As showed in table 1 and additional file 2, CARD detected 
9 breakpoints in dataset 1, and KH test indicated that 1 
breakpoint (2301) was significant at p-value < 0.01 level. 
For dataset 2 and 3, GARD detected 2 and 5 breakpoints 
respectively, but none of them was significant after KH 
test. 

Positive selection on the S protein of SARS-CoV 

We analyzed the entire S gene of 156 isolates. Because sev¬ 
eral isolates were identical at the amino acid sequence 
level, we eliminated them from the dataset since previous 
analyses indicated that contribute limited evolutionary 
information [19,25]. Therefore, 59 sequences were com¬ 
piled into dataset 1. Table 2 presents the analysis results of 
dataset 1. The analyses of branch-specific model (FR) 
indicate that selective pressure varied along branches. 
Many branches in the HPEM and PCHP04 groups clearly 
have higher co ratios. The LRT statistic for comparing M0 
and FR is significant, which confirm the heterogeneous 
selective pressure along branches. According to the site- 
specific model (M3), 1.3% sites among S protein are 
under positive selection with co = 3.214. Furthermore, this 
model identifies 9 sites under positive selection at poste¬ 
rior probability p > 90% level (Table 2). All these sites are 
distributed within the SI domain. 

The results of branch-site model revealed that no evidence 
of positive selection was found in the BSL, PC03 and HPL 
groups. For the HPEM group, the results indicated that 
3.2% sites of S gene are subjected to strong positive selec¬ 
tion with co = 28.756. At p > 90% level, 14 specific sites 


Table I: KH tests verify the significance of breakpoints estimated by GARD analysis 






p-value 

Dataset 

Number of breakpoints 

AIC c improvement 

Breakpoint location 

LHS 

RHS 

Spike 

9 

588.485 

776 

1.000 

0.220 




933 

0.059 

0.427 




1257 

1.000 

0.464 




1485 

1.000 

1.000 




2067 

1.000 

0.670 




2301 

0.002 

0.002 




2592 

0.893 

0.061 




2916 

0.085 

0.988 




3501 

1.000 

0.659 

Replicase 

2 

30.414 

1230 

1.000 

0.507 




4398 

1.000 

0.040 

3'-end ORFs 

5 

254.203 

454 

0.363 

0.306 




729 

0.210 

0.001 




1091 

0.010 

0.078 




1927 

0.254 

0.005 




3321 

1.000 

0.353 


KH test was used in both directions to compare phylogenies constructed from the alignment segment to the left hand side (LHS) and right hand 
side (RHS) of each estimated breakpoint. All p-values have been adjusted by Bonferroni correction. 
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Table 2: Maximum likelihood (ML) estimates for 59 S genes of SARS-CoV 


Models 

d.f. 

Parameters 

Parameters 

lnL 0 

2AI 

P-value 

Positively 



under 

under 

(InL,) 



selected 



null 

alternative 




sites* 



model 

model 





Branch Model 

1 14 

M0 (one ratio) 

Free Ratio 

-12834.1 10 

355.006 

< 0.001 

Not allowed 

M0 vs. FR 


co = 0.081 

co = 0~CO 

(-12656.604) 




Site Model 

4 

M0 (one ratio) 

M3 (discrete, K = 3) 

-12834.1 10 

436.204 

< 0.001 

142,31 1,430, 

M0 vs. M3 


co = 0.081 

p 0 = 0.732, co 0 = 0.015 

(-12616.008) 



462 , 479 , 540 




pi = 0.255, to| = 0.285 
p 2 = 0.013, co 2 = 3.214 




609, 626 , 665 

Branch-site model A 







BSL group 

1 

MA' (fix a> 2 = 1) 

MA 

-12661.687 

0 

1.000 

None 

as foreground 


p 0 = 0.912, co 0 = 0.047 

p 0 = 0.912, co 0 = 0.047 

(-12661.687) 




MA' vs. MA 


pi = 0.088 

pi = 0.088, co 2 = 1.000 







(P2a + P2b=0) 

(p2a + p2b = 0) 





PC03 group 

1 

MA' (fix co 2 = 1) 

MA 

-12658.258 

0.024 

0.877 

None 

as foreground 


p 0 = 0.707, co 0 = 0.046 

p 0 = 0.782, co 0 = 0.046 

(-12658.246) 




MA' vs. MA 


p, = 0.069 

p, = 0.076, co 2 = 1.592 







(p2a + p2b = 0.224) 

(P2a + P2b= 1 42) 





HPEM group 

1 

MA' (fix co 2 = 1) 

MA 

-12646.1 15 

15.572 

<0.001 

49, 75, 344, 

as foreground 


p 0 = 0.587, co 0 = 0.044 

p 0 = 0.885, co 0 = 0.045 

(-12638.329) 



360, 501, 778 

MA' vs. MA 


p, = 0.055 

pi = 0.083, co 2 = 28.756 




794, 860, 861 



(P2a + P2b= °- 358 ) 

(P2a + P2b = ° 032 ) 




1001, 1 148, 1 163 

1 179, 124 

HPL group 

1 

MA' (fix co 2 = 1) 

MA 

-12650.732 

0 

1.000 


as foreground 


p 0 = 0.400, co 0 = 0.045 

p 0 = 0.400, co 0 = 0.045 

(-12650.732) 




MA' vs. MA 


p,= 0.038 

pi= 0.038, co 2 = 1.000 







(P2a + P2b = °- 562 ) 

(P2a + P2b = °- 562 ) 





PCHP04 group 

1 

MA' (fix co 2 = 1) 

MA 

-12626,601 

1 13.802 

< 0.001 

78, 91, 108 , 

as foreground 


p 0 = 0.718, co 0 = 0.045 

p 0 = 0.901, co 0 = 0.045 

(-12569.700) 



1 13, 147 , 227 , 

MA' vs. MA 


p,= 0.057 

p,= 0.072, co 2 = 57.205 




243, 425, 440, 



(P2a + P2b = °- 225 ) 

(P2a + P2b = 0.027) 




462, 479, 609, 

613, 632, 743, 

765, 839, 844, 

856, 900, 1052 

1080 

SARS group 

1 

MA' (fix co 2 = 1) 

MA 

-12498.107 

19.274 

<0.001 

2, 7, 9, 12, 14, 20, 

as foreground 


p 0 = 0.753, co 0 = 0.024 

p 0 = 0.792, co 0 = 0.027 

(-12488.470) 



27,33,37, 43, 58 , 68 , 

MA' vs. MA 


p,= 0.035 

p,= 0.034, co 2 = 1.989 




70, 75, 84, 107, 108, 



(P2a + P2b= 0-212) 

(p2a + p2b = 0.174) 




131, 134, 137, 139 

147, 151, 154, 163, 165, 

167 , 169 , 174 , 199 , 201 , 

214 , 227 , 230 , 237 , 239 

242, 243, 244, 248, 249, 

294, 333 , 336 , 344 , 353 , 
391 , 392 , 426 , 431, 

440 , 442 , 457 , 459, 462 , 

479, 480 , 487, 488 , 494, 607, 
613 , 644, 729 , 732 , 743 , 

754, 758 , 765 , 778 , 

1052 , 1080 , 1148, 1163 


* Positively selected sites are identified with posterior probability p > 90%. In boldface, p > 95%. 


were identified as potentially under positive selection 
(Table 2). For the PCHP04 group, 2.7% codon sites of the 
S gene are driven by strong positive selection with co = 
57.205. Twenty two positively selected sites were identi¬ 
fied in this group (p > 90%). Fourteen of them are in SI 
and eight in S2 domain. For the selection of entire SARS- 
CoVs from the two epidemics, the branch-site model A 
analyses indicated that 17.4% sites are under positive 
selection with co = 1.989. A total of 74 sites were identified 


as potentially under positive selection along these line¬ 
ages at 90% cutoff. In order to intuitively represent the 
distribution of these positively selected sites, we con¬ 
structed the additional file 3, from which we can find that 
most of these sites distribute in SI domain. 

HYPHY package analysis accessed through Datamonkey 
facility also detected positive selection in F1PEM and 
PCHP04 groups (with d N -d s = 0.061 and 0.938 respec- 
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tively), but not in BSL, PC03 and HPL groups. As indi¬ 
cated in table 3, we also identified some positively 
selected sites, most of which are identical to those identi¬ 
fied by the branch-site model A. 

Positive selection on replicase domains of SARS-CoV 

PLpro, 3CLpro, RdRp and Hel are the major domains for coro- 
navirus replication [43,44], We merged these four domains 
into one supergene for analysis because: 1) Yang et al. reported 
that gene concatenating analysis produced same outcomes as 
those obtained from analysis of separate genes [42]; 2) sepa¬ 
rate analysis results in mechanical repeats; 3) concatenating 
analysis can provide additional information because of addi¬ 
tional number of sequences for the merged dataset, compared 
to separate dataset. Therefore, dataset 2 consists of 35 concate¬ 
nated sequences from 129 complete genomes. 

As presented in table 4, the results of branch model analysis 
reveal that the to ratio varies from 0 to infinite along differ¬ 
ent branches. This implies that selective pressures among 
these domains vary in different hosts and at different epi¬ 
demic phases, though these domains are the most con¬ 
served regions in CoV. Analysis using the discrete model 
(M3) detected no sign of positive selection in the dataset 2, 
although it suggests that the to ratios vary significantly 
among different amino acid sites as indicated by LRT. 


Utilizing the branch-site model A analysis indicated that 
there is no positive selection in the BSL, PC03 and 
PCHP04 groups. However, the model A analysis revealed 
that among HP03 group about 8.1% codon sites of these 

4 domains are potentially under strong positive selection 
with co = 11.093 and 28 sites were identified (7 in PLpro, 

5 in 3CLpro, 7 in RdRp, 9 in HEL). Weak positive selection 
(d N -d s = 0.001) was also detected from HP03 group by 
using HYPHY package but not other groups (Table 3). 

Positive selection on 3'-end ORFs of SARS-CoV 

The 3'-end of SARS-CoV genome encodes 11 ORFs: 
ORF3a, ORF3b, ORF4 (E), ORF5 (M), ORF6, ORF7a, 
ORF7b, ORF8a, ORF8b, ORF9a (N), and ORF9b. The E, 
M, N proteins are structural proteins of SARS-CoV and the 
other proteins are accessory proteins. Because the coding 
regions of ORF3b and ORF9b overlap partially or com¬ 
pletely with those of ORF3a and N, we excluded the 
ORF3b and the ORF9b from this analysis. The ORF8a and 
ORFSb are present as two separate ORFs in most human 
isolates but as a single ORF (ORF8ab) in isolates from ani¬ 
mals and early phase human due to the presence of extra 
29 nt in this region, thus resulting in the fusion of ORF8a 
and ORFSb. Because of the difficulty in obtaining a relia¬ 
ble alignment in this region, ORF8 (a, b or ab) was 
excluded from our analysis as well. For similar reasons as 


Table 3: REL analysis results for three datasets 


Groups 3 

No. of sequences 11 

Mean d N -d s e 

Positively selected site(s) 

Spike 

BAT 

4 

-0.957 


PC03 

3 

-0.904 


HPEM 

14 

0.061 

49, 75, 77, 144, 239, 244, 311, 344, 778, 860, 861, 1001, 1148, 1163, 1179, 1247 

HPL 

11 

-0.138 


PCHP04 

27 

0.938 


SARS b 

40 

0.361 

75, 147, 227, 239, 243, 244, 31 1,462, 479, 609, 613, 743, 765, 778, 1080, 1 163 

Replicase 

BAT 

4 

-0.985 


HP03 

21 

0.008 

654 

PCHP04 

8 

-0.774 


SARS 

31 

-0.561 


3'-end ORFs 

BAT 

4 

-0.91 


HPEM 

5 

-0.42 


HPML 

33 

0.152 


PCHP04 

12 

-0.571 


SARS 3 

40 

-0.301 



a. At least 3 sequences are needed for REL analysis, so PC03 groups of dataset 2 and 3 were not analyzed. 

b. The upper limit in number of sequences for REL test is 40, so 15 sequences were removed from original SARS group (removed sequences' 

number: 33,40,43, 50, 86, 108, NO, III, 123, 135, 139, 144, 147, 156) 

c. 12 sequences were removed from original SARS group (removed sequences' number: 5, 19, 50, 56, 57, 81,82, 88, 91, 103, 107, III) 

d. As a rule of thumb, at least 10 sequences are needed to detect selection at a signal site with reliability. So some of the results may be not reliable 
because of not enough sequences are available for some groups. 

e. Because d s could be 0 for some sites, Datamonkey reports d N -d s in place of d N ld s 
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Table 4: Maximum likelihood (ML) estimates for 35 merged replicase genes of SARS-CoV 


Models 

d.f. 

Parameters under 

null 

model 

Parameters 

under 

alternative 

model 

lnL 0 

(InL,) 

2AI 

P-value 

Positively 

selected 

sites* 

Branch Model 
M0 vs. FR 

66 

M0 (one ratio) 
w = 0.024 

Free Ratio 

co = 0~co 

-14460.634 

(-14354.549) 

212.17 

< 0.001 

Not allowed 

Site Model 4 

M0 vs. M3 

Branch-site model A 

M0 (one ratio) 
co = 0.024 

M3 (discrete, K = 3) 
p 0 = 0, co 0 = 0 
p,= 0.972, C0|= 0.016 
(p 2 = 0.028), (o 2 = 0.360 

-14460.634 

(-14447.201) 

26.866 

<0.001 

None 

BSL group 
as foreground 
MA' vs. MA 

1 

MA 1 (fix co 2 = 1) 
p 0 = 0.993, co 0 = 0.020 
p, = 0.007 
(Pla+Plb = °) 

MA 

p 0 = 0.993, co 0 = 0.020 
pi = 0.007, coj = 1.000 
(p2a + p2b = °) 

-14449.205 
(-14449. 205) 

0 

1.000 

None 

PC03 group 
as foreground 
MA' vs. MA 

1 

MA 1 (fix co 2 = 1) 
p 0 = 0.993, co 0 = 0.020 
p, = 0.007 

(p2a + p2b = 0) 

MA 

p 0 = 0.993, co 0 = 0.020 
pi = 0.007, co 2 = 1.000 

(P2a + f>2b=0) 

-14449. 205 
(-14449. 205) 

0 

1.000 

None 

HP03 group 
as foreground 
MA' vs. MA 

1 

MA 1 (fix co 2 = 1 ) 
p 0 = 0.322, co 0 = 0.015 

p, = 0.002 
(p2a + p2b = °- 676 ) 

MA 

p 0 = 0.913, co 0 = 0.015 
p,= 0.006, co 2 = 1 1-093 

(p2a + p2b= 008 1 ) 

-14389.596 

(-14386.122) 

6.948 

0.008 

23, 123,222, 236, 237,250, 

266, 375, 377, 409, 504, 563, 

646, 654, 884, 1234, 1259, 1482, 
1491, 1786, 1866, 1869, 1878, 

1963, 1995, 2010, 2032, 2034 

PCHP04 group 
as foreground 
MA' vs. MA 

1 

MA 1 (fix co 2 = 1) 
p 0 = 0.761, co 0 = 0.018 
p, = 0.006 

(f>2a + f>2b = 0-234) 

MA 

p 0 = 0.760, co 0 = 0.018 
p, = 0.006, co 2 = 1.000 
(P2a + P2b = °- 234 ) 

-14435.921 

(-14435.921) 

0 

1.000 

None 

SARS group 
as foreground 
MA' vs. MA 

1 

MA 1 (fix co 2 = 1) 
p 0 = 0.850, co 0 = 0.012 
p, = 0.005 
(p2a + p2b = 01 45) 

MA 

p 0 = 0.857, co 0 = 0.012 
pi = 0.005, co 2 = 1.061 
(P2a + P2b = 0-138) 

-14405.997 

(-14405.994) 

0.006 

0.938 



* Positively selected sites are identified with posterior probability p > 90%. In boldface, p > 95%. 


mentioned above, we merged the 7 remaining ORFs into 
a supergene for analysis. 

As presented in table 5, the results of FR model analysis 
revealed that selective pressures vary among lineages. The 
results of M3 model also implied variation in selective 
pressure among different amino acid sites. However, the 
M3 model did not detect any sign of positive selection. 
The results of branch-site model A revealed that, except for 
the BSL, PC03 and HPEM groups, the other groups dis¬ 
played positive selection signatures (Table 5). For the 
HPML group, about 12.2% sites of these ORFs were 
shown to be under positive selection with cd = 9.863. 
Twenty five specific sites were identified: 6 in ORF3a (11, 
29, 85 129, 136, 222); 4 in E (279, 280, 304, 319); 9 in M 
(377, 388, 418, 423, 436, 449, 463, 469, 504); 1 in ORF6 
(584); 1 in ORF7a (696); and 4 in N (850, 932, 954, 993). 
When the PCEIP04 group was defined as foreground, the 
branch-site model A analysis estimated that 1.9% sites 
were under positive selection with to = 22.447 and four 
sites were identified to be under positive selection (2 in 
ORF3a, 1 in ORF6, 1 in N). For the whole SARS-CoV col¬ 
lection, the branch-site model A analysis revealed 12.2% 


sites of these ORFs to be under positive selection with co = 
3.138. A total of 17 sites were identified at p > 90% level. 
Among them, 9 are located in ORF3a, 3 in M, 2 in ORF6 
and 3 in N. In addition, a large number of sites were iden¬ 
tified to be potentially under positive selection at p > 70% 
level (see additional file 3B). 

Discussion 

Natural selection generally leads to a reduction in delete¬ 
rious mutations while promoting advantageous muta¬ 
tions. If a gene is highly divergent, there are two main 
explanations: first, it may be due to high mutation rate or 
relaxed selective constraint, in which case the gene may be 
free to mutate mainly because it has no fitness or function; 
or second, due to positive selection which is promoted by 
natural selection and the gene usually has highly impor¬ 
tant functions [45], Virus entry, replication, assembly and 
release are the main steps of viral life cycle. Proteins 
involved in each of these steps may undergo adaptive evo¬ 
lution after a virus invades a new host. 

Recombination and mutation are two important evolu¬ 
tionary mechanisms driving gene diversity and adapta- 
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Table 5: Maximum likelihood (ML) estimates for 56 merged 3'-end ORFs of SARS-CoV 


Models 

d.f. 

Parameters 

under 

null 

model 

Parameters 

under 

alternative 

model 

lnL 0 

(InL,) 

2AI 

P-value 

Positively 

selected 

sites* 

Branch Model 
MO vs. FR 

108 

M0 (one ratio) 
to = 0.170 

Free Ratio 

CD = 0~co 

-9142.692 

(-9055.881) 

173.623 

< 0.001 

Not allowed 

Site Model 4 

MO vs. M3 

Branch-site model A 

M0 (one ratio) 
to = 0.170 

M3 (discrete, K = 3) 
p 0 = 0, co 0 = 0 
p |= 0.866, C0|= 0.058 
(p 2 = 0.134),co 2 = 0.986 

-9142.692 

(-9093.135) 

99.1 14 

< 0.001 

None 

BSL group 
as foreground 
MA' vs. MA 

1 

MA' (fix co 2 = 1) 
p 0 = 0.868, co 0 = 0.059 
p,= 0.132 
(p2a + p2b = °) 

MA 

p 0 = 0.868, co 0 = 0.059 
p, = 0.132, co 2 = 1.000 
(p2a + p2b = °) 

-9093.137 

(-9093.137) 

0 

1.000 

None 

PC03 group 
as foreground 
MA' vs. MA 

1 

MA' (fix co 2 = 1) 
p 0 = 0.868, co 0 = 0.059 
p,= 0.132 
(P2a+P2b = 0) 

MA 

p 0 = 0.868, co 0 = 0.059 
pi= 0.132, co 2 = 1.000 

(P2 a + P2b=0) 

-9093.137 

(-9093.137) 

0 

1.000 

None 

HPEM group 
as foreground 
MA' vs. MA 

1 

MA' (fix co 2 = 1) 
p 0 = 0.855, co 0 = 0.059 
p,= 0.130 
(p2 a + p2b= °- 0IS ) 

MA 

p 0 = 0.861, co 0 = 0.059 
p,= 0.131,03 2 = 4.300 
(P2a + P2b = 0.008) 

-9093.127 

(-9093.088) 

0.078 

0.780 


HPML group 
as foreground 
MA' vs. MA 

1 

MA' (fix C0 2 = 1) 
p 0 = 0.125, co 0 = 0.047 
p,= 0.017 
(p2 a + p2b = °- 858 ) 

MA 

p 0 = 0.772, co 0 = 0.046 
p i = 0.106, co 2 = 9.863 
(p2 a + p2b= °- 122 ) 

-9069.427 

(-9065.120) 

8.614 

0.003 

1 1, 29, 85, 129, 136, 

222, 279, 280, 304, 319, 

377, 388, 418, 423, 436, 

449, 463, 469, 504, 584, 

696, 850, 932, 954, 993 

PCHP04 group 
as foreground 
MA' vs. MA 

1 

MA' (fix C 0 2 = 1) 
p 0 = 0.690, co 0 = 0.055 
p,= 0.097 
(P2a + P2b= °- 213 ) 

MA 

p 0 = 0.862, co 0 = 0.057 
p, = 0.1 19, co 2 = 22.447 
(P2a + P2b= °- 019 ) 

-9087.427 

(-9076.176) 

22.502 

< 0.001 

25, 259, 609, 1 184 

SARS group 
as foreground 
MA' vs. MA 

1 

MA' (fix co 2 = 1) 
p 0 = 0.664, co 0 = 0.033 

p | — 0.066 
(p2a + p2b = °- 270 ) 

MA 

p 0 = 0.804, co 0 = 0.037 
p,= 0.074, co 2 = 3.138 
(p2a + p2b = °- 122 ) 

-9058.932 

(-9051.231) 

15.402 

< 0.001 

II, 15, 81,117, 120, 121, 

171, 193, 259, 355, 361,560, 
609,628, 830, 850, 1184 


* Positively selected sites were identified with posterior probability p > 90%. In boldface, p > 95%. 


tion. Since recombination can affect the detection of 
positive selection, we first tested for recombination in our 
datasets [46]. CARD detected no evidence of recombina¬ 
tion within the replicase and 3'-end ORFs, while one puta¬ 
tive breakpoint in spike protein was detected. Whether 
there is any recombination among SARS-CoV is still 
debatable [13,22,47-49]. Previous studies suggested puta¬ 
tive recombination only when analysis of SARS-CoV 
sequences were put together with other coronaviruses [47- 
49]; however, when analyses were focused solely on SARS- 
CoV sequences, recombination could not be detected 
[13,22], We also tested recombination in SARS group 
alone for each dataset. No evidence of recombination was 
detected by CARD. These results might imply that there 
could be some ancient recombination events occurred 
between bat SARS-like-CoV and the ancestor of SARS- 
CoV, which drove the bat SARS-like-CoV adaption to civet 
and human. Nevertheless, previous studies had revealed 
that detection of positive selection by LRT method was 
robust to low levels of recombination (with fewer than 


three recombination events), and identification of sites 
under positive selection by the empirical Bayes method 
appeared to be less affected than the LRT by recombina¬ 
tion [46]. Overall, the issue of recombination among RNA 
viruses is highly controversial because the putative recom¬ 
bination events described were detected only by utilizing 
computationally-demanding phylogenetic analyses (split 
decomposition and/or maximum likelihood methods). 
Therefore, caution should be used when inferring conclu¬ 
sions about putative recombination events that are based 
only on such analyses. Because viable clonal recombinant 
viruses have been rarely observed in nature, for natural 
recombination leading to the transmission of a recom¬ 
binant strain to be conclusively confirmed, the following 
prerequisites should be met: (i) the recombinant crosso¬ 
ver should be demonstrated in a single PCR amplicon fol¬ 
lowing cloning to ensure it occurs in a single DNA 
molecule; (ii) the recombination should be demonstrated 
repeatedly in clonal populations of viable virus (e.g. a 
plaque harvest or limited endpoint dilution; and (iii) the 
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recombinant should maintain adequate sequence conser¬ 
vation during post-recombination evolution [50], 

The S protein is a structural protein of coronavirus. It has 
a crucial role in the binding of virus to host receptor and 
subsequent fusion between the viral and host mem¬ 
branes, both processes being important for virus entry 
into host cell. In the case of several mammalian and avian 
coronaviruses, the S protein is cleaved into SI and S2. The 
former contains receptor attachment sites and the later is 
involved in the fusion of CoV onto host cell. The SI sub¬ 
unit, which usually has high divergence, contains a recep¬ 
tor binding domain (RBD); the S2 subunit, which is 
comparatively more conserved, contains two heptad 
repeat (HR) domains [51]. Several studies revealed that 
the S gene of SARS experienced noticeable positive selec¬ 
tion during the SARS epidemics, especially in the early 
and middle phases [20-22], However, our analyses indi¬ 
cated that the S protein of SARS-CoV underwent a step¬ 
wise adaptive process subsequent to its spillover into the 
civet and human populations. In the BSL group, our anal¬ 
yses suggest that the bat SARS-like-CoVs experienced puri¬ 
fying selection, indicating that the S gene is relatively 
stable in bats. In palm civet, SARS-CoV experienced strong 
positive selection as indicated by the results of PCHP04 
group. The failure to identify significant positive selection 
in PC03 group was most likely due to the limited number 
of sequences available for analyses (only four sequences 
for PC03 group and two of them have identical S gene 
sequences). During the early and middle phases of the 
2002-2003 SARS epidemic in human population, a small 
fraction of sites among the S protein were under strong 
positive selection. In contrast, isolates from the late phase 
showed no sign of positive selection, implying the S pro¬ 
tein became stable again after two stages of adaptive evo¬ 
lution. Our analysis using the HYPHY package accessed 
through Datamonkey facility also revealed similar evolu¬ 
tionary patterns for the S protein (Table 3). Taken 
together, these results support the hypothesis that SARS- 
CoVs originated in bats, that the spill over into civets and 
humans were recent events and that the two SARS epi¬ 
demics that took place one year apart, were results of inde¬ 
pendent animal-to-human transmissions [7,20]. The 
major sequence difference of the S genes between bat 
SARS-like-CoVs and civet/human SARS-CoVs suggests 
that there may be other more closely related SARS-CoVs in 
bats or there may be other unknown intermediate animal 
host(s) in the transmission of the virus(es) from bats to 
civets [49], 

Among the SARS-CoVs from human and palm civets, 
numerous sites are identified to be potentially under pos¬ 
itive selection (Table 2 and see additional file 3A). Those 
sites inferred from different SARS epidemic phases reflect 
the adaptation process of SARS-CoVs. Those sites identi¬ 
fied from the PCHP04 group may be important for SARS- 


CoV adaptation to palm civets. Those sites identified from 
the HPEM group may be important for SARS-CoV adapta¬ 
tion to human. Most of these sites, especially from the 
entire SARS group, are located in the SI domain that con¬ 
tains the receptor binding domain. Zhang et al. [22] pre¬ 
viously identified 12 positively selected sites in the SARS- 
CoV group, all of which were confirmed in our current 
study. The greater number of sites identified in our study 
is likely due to the fact that the branch-site model is more 
powerful than the site-specific model. Some of these sites 
have been confirmed by experimental data to be crucial 
for the adaption of SARS-CoVs to human. For example, it 
has been found that adaptation of S protein to human 
angiotensin converting enzyme 2 (ACE2) is facilitated by 
alteration of residue 479 to asparagine and of 487 to thre¬ 
onine [52,53]. Also, using site-directed mutagenesis, Zhu 
and Chakraborti identified that residues 344, 392, 426, 
431, 479, 480, 487, 488 and 494 are important for the 
binding of RBD with ACE2 and SARS-CoV antibody 
[54,55], In RBD, there are eight newly identified sites 
(333, 336, 353, 391, 440, 442, 457, 459, and 462) which 
have not been proved to be critical for RBD and ACE2 
interaction. Furthermore, there are ~60 sites to be under 
positive selection beside RBD. Although evaluation of 
every observed site under positive selection by reverse 
genetics would not be realistic or feasible, generation and 
evaluation of mutant viruses based on sites located within 
or adjacent to functional domains could provide clues for 
the genetic aetiology of SARS adaptation to new hosts and 
emergence. 

The first two thirds of coronavirus genome encode two 
large polyproteins: Pla and Plab, which are cleaved by 
virus-encoded proteinases (PLpro and 3CLpro) into 16 
non-structure proteins (nspl-nspl6) playing important 
role during coronavirus replication. Because the Plab is 
too big (~21 kb), we analyzed four most important 
domains related to viral replication: PLpro, 3CLpro, RdRp 
and Hel, which correspond to nsp3, nsp5, nspl2 and 
nspl3, respectively [17,43,44], Our results revealed that, 
unlike the adaptive evolutionary pattern of S protein, 
these replicase domains did not experience positive selec¬ 
tion in bat and palm civet, but underwent strong positive 
selection in human patient. Moreover the evidence of pos¬ 
itive selection is stronger in the later phases than that 
observed in the early and middle phases (data no shown). 
Furthermore, our analysis using HYPHY package observed 
very weak positive selection in the HP03 group but not 
among the other groups (Table 3). These results to some 
extent differ from the observations derived from several 
previous studies. Using pairwise analysis of the d M /d s , 
Song et al. found that the average co ratio of S protein for 
the early phase was larger than that for the middle phase, 
which in turn was larger than the ratio for the late phase. 
A similar pattern was found in ORFla and ORF3a but not 
in ORFlb and nsp3, which were suggested under purify- 
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ing selection during the whole course of the epidemic. 
They identified over 200 single-nucleotide variations 
(SNVs) and inferred the importance of some SNVs on 
SARS-CoV adaptive evolution [21], Zhang et al. also inves¬ 
tigated the adaptive evolution of the S protein employing 
the site specific model, but they did not observe any pos¬ 
itive selection in RdRp, Hel or nsp3 [21,22]. The best 
explanation for this apparent discrepancy is that the meth¬ 
ods used in their studies were more conservative than the 
branch-site model used in our study, and hence were not 
able to detect positively selected changes among the 
highly conserved genes. Alternatively, it may be due to the 
use of concatenating analysis in our study, which can pro¬ 
vide additional information due to the compiling of more 
dissimilar sequences for datasets. For the replicase genes 
of bat and civet isolates, there was no sign of positive 
selection. This is probably due to the following reasons: 1) 
the civet isolates were collected within a very short time 
period and thus there was not enough time for adaptive 
evolution; 2) civet cells are very suitable for SARS-CoV 
replication which may imply that the civet is a perfect 
intermediate host for SARS-CoV; and 3) the bat isolates 
might have completely adapted to their hosts and hence 
were under no further selective pressure for evolution. 

As to the 3'-end ORFs, the most diversifying selection hap¬ 
pened in the middle and late phases of the SARS epidemic 
in 2003-2004. No positive selection was found in the 
BSL, PC03 and FIPEM groups. When the isolates from two 
epidemics of 2003 and 2004 were investigated together, 
12.2% sites in these 7 ORFs were shown to be under pos¬ 
itive selection (Table 5). In addition to a few sites identi¬ 
fied at p > 90% level, many sites are inferred to be under 
positive selection at 70-80% posterior probability (see 
additional file 3B). Most of these sites are located in 
ORF3a, E, M and ORF6, implying these four genes may 
play a more important role for SARS-CoV adaptation in a 
new host. Our results based on the REL method showed 
weak positive selection in HPML group, but not in other 
groups (Table 3). The failure to identify specific sites 
under positive selection could be due to weak and thus 
undetected positive selection. These results suggested that 
the 3'end-ORFs underwent positive selection after SARS- 
CoV spilled out into civet and human populations, and 
adaptive evolution mainly happened in the middle and 
later phases in 2003. Song et al. previously suggested that 
the 3a protein evolved adaptively as S protein [21], By 
estimating mutation rates, Zhao et al. suggested that the 
non-synonymous substitution rates were comparatively 
high in E, M and N [56]. ORF3a encodes a protein of 274 
amino acids. A recent study indicated that the 3 a protein 
forms a potassium sensitive channel that may promote 
virus release and may be important for modulating 
expression of S on the cell surface [16,57], 3a protein also 
interacts with the structural proteins S, E, M [57,58], 


Therefore, amino acid changes in 3 a protein might be nec¬ 
essary to maintain the interaction between 3a and other 
proteins. E and M protein play a pivotal role in viral mor¬ 
phogenesis, assembly and budding. Co-expression of E 
and M was shown to produce virus-like particles, roughly 
the same size and shape as virions [59]. N protein is 
important for viral packaging which is the first step in the 
assembly of infectious SARS viruses [60], Thus the amino 
acid changes in these three structural proteins may be crit¬ 
ical for virus assembly in the new host. SARS-CoV ORF 6 
protein can enhance the virulence of attenuated murine 
coronavirus (MFTV) [61], as well as stimulate cellular 
DNA synthesis [62], A recent study showed that ORF 6 
protein inhibited both interferon synthesis and signaling 
[63]. These findings suggested that ORF 6 may have a role 
in enhancing virus replication or assembly. ORF 7a pro¬ 
tein inhibits cellular protein synthesis and blocks cell 
cycle progression at G0/G1 phase, suggesting that 7a may 
play important roles in the life cycle of SARS-CoV and the 
pathogenesis induced by SARS-CoV [64,65], The function 
of other accessory proteins remains to be determined. 
Overall, these findings suggest that the 3'-end ORFs play 
important roles for SARS-CoV replication, assembly and 
release. Collectively, amino acid changes in these proteins 
could play a role in modulating the host switch of SARS- 
CoV. 

Conclusion 

We systematically analyzed the individual SARS-CoV pro¬ 
teins important for virus entry, replication and assembly. 
The results suggested that SARS-CoVs experienced a step¬ 
wise adaptation to humans. In palm civets and humans 
during the early and middle epidemic phases, virus entry- 
mediating protein S experienced strong positive selection. 
In contrast, the replicase proteins experienced positive 
selection only in human patients but not in palm civets, 
implying that the palm civet is a suitable intermediate 
host for SARS-CoV replication. The proteins involved in 
virus assembly and release mainly underwent positive 
selection during the middle and later epidemic phases. 
These results highlight the spectacular dynamics of SARS- 
CoV evolution in a narrow time window, period of epi¬ 
demic, support the zoonotic origin of SARS and suggest 
that some amino acid sites may be critical for viral adap¬ 
tation in different hosts. Collectively, these results suggest 
that the development of SARS-CoV reverse genetics system 
will facilitate further molecular and/or epidemiological 
investigations to elucidate role of adaptive virus evolution 
in future emergence events. 
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